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Abstract. We propose efficient preconditioning algorithms for an eigenvalue problem arising 
in quantum physics, namely the computation of a few interior eigenvalues and their associated 
eigenvectors for the largest sparse real and symmetric indefinite matrices of the Anderson model 
of localization. We compare the Lanczos algorithm in the 1987 implementation by CuUum and 
Willoughby with the shift-and-invert techniques in the implicitly restarted Lanczos method and in 
the Jacobi-Davidson method. Our preconditioning approaches for the shift-and-invert symmetric 
indefinite linear system are based on maximum weighted matchings and algebraic multilevel incom- 
plete LDL^ factorizations. These techniques can be seen as a complement to the alternative idea of 
using more complete pivoting techniques for the highly ill-conditioned symmetric indefinite Anderson 
matrices. We demonstrate the effectiveness and the numerical accuracy of these algorithms. Our nu- 
merical examples reveal that recent algebraic multilevel preconditioning solvers can accelerative the 
computation of a large-scale eigenvalue problem corresponding to the Anderson model of localization 
by several orders of magnitude. 
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1. Introduction. One of the hardest challenges in modern eigenvalue computa- 
tion is the numerical solution of large-scale eigenvalue problems, in particular those 
arising from quantum physics such as, e.g., the Anderson model of localization (see 
Section |31 for details). Typically, these problems require the computation of some 
eigenvalues and -vectors for systems which have up to several million unknowns due 
to their high spatial dimensions. Furthermore, their underlying structure involves 
random perturbations of matrix elements which invalidates simple preconditioning 
appraoches based on the graph of the matrices. Moreover, one is often interested 
in finding some eigenvalues and associated eigenvectors in the interior of the spec- 
trum. The classical Lanczos approach [51] has lead to eigenvalue algorithms [16, 17] 
that are in principle able to compute these eigenvalues using only a small amount 
of memory. More recent work on implicitly started Lanczos techniques [42, 43] has 
accelerated these methods significantly, yet to be fast one needs to combine this ap- 
proach with shift-and-invert techniques, i. e. in every step one has to solve a shifted 
system of type A — al, where cr is a shift near the desired eigenvalues and A G R"'", 
A — A^ is the associated matrix. In general shift-and-invert techniques converge 
rather quickly which is inline with the theory [51]. Still, an efficient solver is required 
to solve systems {A — al)x = b efficiently with respect to time and memory. While 
implicitly restarted Lanczos techniques [42, 43] usually require to solve the system 
{A — al)x = b to maximum precision and thus are mainly suited for sparse direct 
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solvers, the Jacobi-Davidson method has become an attractive ahernative [61] in 
particular when dealing with preconditioning methods for linear systems. 

Until recently, sparse symmetric indefinite direct solvers were still far off from 
symmetric positive definite solvers and this might have been one major reason why 
shift-and-invert techniques were not able to compete with traditional Lanczos tech- 
niques [27], in particular because of memory constraints. With the invention of 
fast matchings-based algorithms [49] which improve the diagonal dominance of linear 
systems the situation has dramatically changed and the impact on preconditioning 
methods [7] as well as the benefits for sparse direct solvers [58] has been recognized. 
Furthermore, these techniques have been successfully transferred to the symmetric 
case [22,24] allowing modern state-of-the-art direct solvers [57] to be orders of mag- 
nitudes faster and more memory efficient than ever, finally leading to symmetric 
indefinite sparse direct solvers that are almost as efficient as their symmetric positive 
definite counter parts. Recently this approach has also been utilized to construct in- 
complete factorizations [38] with similarly dramatic success. For a detailed survey on 
preconditioning techniques for large symmetric indefinite linear systems the interested 
reader should consult [5,6]. 

2. Numerical approach for large systems. In the present paper we combine 
the above mentioned advances with inverse-based preconditioning techniques [8] . This 
allows us to find interior eigenvalues and -vectors for the Anderson problem several 
orders of magnitudes faster than traditional algorithms [16,17] while still keeping the 
amount of memory reasonably small. 

Let us briefly outline our strategy. We will consider recent novel approaches 
in preconditioning methods for symmetric indefinite linear systems and eigenvalue 
problems and apply them to the Anderson model. Since the Anderson model is a large- 
scale sparse eigenvalue problem in three spatial dimensions, the eigenvalue solvers we 
deal with are designed to compute only a few interior eigenvalues and eigenvectors 
avoiding a complete factorization. In particular we will use two modern eigenvalue 
solvers which we will briefly introduce in Section 01 The first one is Arpack [42] , 
which is a Lanczos- type method using implicit restarts (cf. section ETl . We use this 
algorithm together with a shift-and-invert technique, i. e. eigenvalues and eigenvectors 
of {A — al)^^ are computed instead of those of A. Arpack is used in conjunction 
with a direct factorization method and a multilevel incomplete factorization method 
for the shift-and-invert technique. 

Firstly, we use the shift-and-invert technique with the novel symmetric indefinite 
sparse direct solver that is part of Pardiso [57] and we report extensive numerical 
results on the performance of this method. Section [3 will give a short overview 
of the main concepts that form the Pardiso solver. Secondly, we use Arpack in 
combination with the multilevel incomplete LU factorization package Ilupack [9]. 
Here we present a new indefinite version of this prcconditioner that is devoted to 
symmetrically indefinite problems and combines two basic ideas, namely (i) symmetric 
maximum weighted matchings [22,24] and (ii) inverse-based decomposition techniques 
[8] . These will be described in Sections 15.21 and 

As a second eigenvalue solver we use the symmetric version of the Jacobi- 
Davidson method, in particular the implementation Jdbsym [32]. This Newton-type 
method (see section is used together with Ilupack [9]. As we will see in several 
further numerical experiments, the synergy of both approaches will form an extremely 
efficient prcconditioner for the Anderson model that is memory efficient while at the 
same time accelerates the eigenvalue computations significantly: system sizes that 
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resulted in weeks of computing time [27] can now be computed within an hour. 

3. The Anderson model of localization. The Anderson model of localization 
is a paradigmatic model describing the electronic transport properties of disordered 
quantum systems [41,54]. It has been used successfully in amorphous materials such 
as alloys [52], semiconductors and even DNA [53]. Its hallmark is the prediction of 
a spatial confinement of the electronic motion upon increasing the disorder — the 
so-called Anderson localization [2]. When the model is used in 3 spatial dimensions, 
it exhibits a metal-insulator transition in which the disorder strength w mediates a 
change of transport properties from metallic behavior at small w via critical behavior 
at the transition Wc and on to insulating behavior and strong localization at larger 
w [41,54]. Mathematically, the quantum problem corresponds to a Hamilton operator 
in the form of a real symmetric matrix A, with quantum mechanical energy levels given 
by the eigenvalues {A}, and the respective wave functions are simply the eigenvectors 
of A, i.e. vectors x with real entries. With N = M x M x M sites, the quantum 
mechanical (stationary) Schrodinger equation is equivalent to the eigenvalue equation 
Ax = Ax, which in site representation reads as 

with i,j,k ~ 1, 2, . . . , M denoting the Cartesian coordinates of a site. The disorder 
enters the matrix on the diagonal where the entries Si-j-k correspond to a spatially 
varying disorder potential and are selected randomly according to a suitable distribu- 
tion [40]. Here, we shall use the standard box distribution Stj-k & [—w/2,w/2] such 
that the w parameterizes the aforementioned disorder strength. Clearly, the eigenval- 
ues of A then lie within the interval [— 6 — w/2, 6+'w/2]. In most studies of the induced 
metal-insulator transition, w ranges from 1 to 30 [54]. But these values also depend 
on whether generalizations to random off-diagonal elements [26,63] — the so-called 
random-hopping problem — , anisotropics [45,48] or other lattice graphs [36,60] are 
being considered. 

The intrinsic physics of the model is quite rich. For disorders w 16.5, the 
eigenvectors are extended, i.e. Xi-j-k is fluctuating from site to site, but the envelope 
\x\ is approximately a nonzero constant. For large disorders w > 16.5, all eigenvectors 
are localized such that the envelope |a:„| of the nth eigenstate may be approximately 
written as exp— [r — r„]/Z„(?i;) with r = {i,j,k)'^ and /^(w) denoting the localiza- 
tion length of the eigenstate. In Figure ITTl we show examples of such states. Note 
that ]a;]^ and not x corresponds to a physically measurable quantity and is therefore 
the observable of interest to physicists. Directly a,t w = Wc ^ 16.5, the extended 
states at A = vanish and no current can flow. The wave function vector x appears 
simultaneously extended and localized as shown in Fig. 13.21 

In order to numerically distinguish these three regimes, namely, localized, critical, 
and extended behaviors, one needs to (i) go to extremely large system sizes of order 10^ 
to 10^ and (ii) average over many different realizations of the disorder, i.e., compute 
eigenvalues or eigenvectors for many matrices with different diagonals. In the present 
paper, we concentrate on the computation of a few eigenvalues and corresponding 
eigenvectors for the physically most interesting case of critical disorder Wc and in the 
center of cr{A), i.e., at A = 0, for large system sizes [3,10,47,64]. Since there is a 
high density of states for a{A) at A = in all cases, we have the further numerical 
challenge of clearly distinguishing the eigenstates in this high density region. 

3.1. Lanczos algorithm and the Cullum-Willoughby implementation. 

Since the mid-eighties, the preferred numerical tool to study the Anderson matrix and 
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Fig. 3.1. Extended (left) and localized (right) wave function probabilities for the 3D Anderson 
model with periodic boundary conditions at X = with N = 100^ and w = 12.0 and 21.0, respectively. 
Every site with probability \xj\^ larger than the average is shown as a box with volume \xj\^N. 

Boxes with \xj\^N > VlOOO are plotted with black edges. The color scale distinguishes between 
different slices of the system along the axis into the page. 

to compute a selected set of eigenvectors, e.g. as needed for a multifractal analysis at 
the transition, was provided by the CuUum-Willoughby implementation (Cwi) [16-18] 
of the Lanczos algorithm. The algorithm iteratively generates a sequence of orthogonal 
vectors Vi, i = 1, . . . K, such that V^AVk — Tk, with V — [vi,V2, ■ ■ ■ vk] and Tk a 
symmetric tridiagonal K x K matrix. In exact arithmetic, the recursion 

l3i+iVi+i = Avi - aiVi - PiV^-i, (3.2) 

where ai = vf Avi and = TJi+iAui are the diagonal and subdiagonal entries of Tk 
and f = and vi is an arbitrary starting vector, is an orthogonal transformation to 
tridiagonal form that needs K = N matrix-vector multiplications. The eigenvalues of 
the tridiagonal matrix Tk (Ritz values) are then simply the eigenvalues of the matrix 
A and the associated Ritz vectors yield the eigenvectors. 

Since A is sparse and symmetric, the underlying matrix-vector multiplication on 
the Cwi can be programmed very efficiently, either by directly coding or appropriate 
sparse storage schemes — only the diagonal needs to be stored in any case. Ad- 
ditionally, the Cwi is a Lanczos implementation in which no reorthogonalization is 
performed. Rather, spurious eigenvalues are identified by extending the set of Ritz 
vectors to more than the TV present in exact arithmetic. Typically, we find that 
K w AN is sufficient for all "good" eigenvalues to have replicated themselves at least 
twice — a further sign that the algorithm has converged. Clearly, this strategy work 
well in the present case since the disorder destroys any symmetry-induced degenera- 
cies. 

The Cwi is memory efficient and does not need elaborate reorthogonalization 
schemes, but does need to construct many Ritz vectors which is computationally in- 
tensive. Nevertheless, in 1999 Cwi was still significantly faster than more modern 
iterative schemes [27]. The main reason for this surprising result lies in the indefinite- 
ness of A, which led to severe difficulties with solvers more accustomed to standard 
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Fig. 3.2. Plot of the electronic eigenstate at the metal- insulator transition with E = 0, w = 16.5 
and N = 250^. The box-and-color scheme is as in Fig. l^.A Note how the state extends nearly 
everywhere while at the same time exhibiting certain localized regions of higher \xj\^ values. 

Laplacian-type problems. 

4. Modern approaches for solving symmetric indefinite eigenvalue prob- 
lems. When dealing with eigenvalues near a given real shift a, the Lanczos algo- 
rithm [51] is usually accelerated when being applied to the shifted inverse {A — crl)^^ 
instead of A directly. This approach relies on the availability of a fast solution method 
for linear systems of type {A — al)x = b. However, the limited amount of available 
memory only allows for a small number of solution steps and sparse direct solvers also 
need to be memory-efficient to turn this approach into a practical method. 

The limited number of Lanczos steps has lead to modern implicitly restarted 
methods [43, 62] which ensure that the information about the desired eigenvalues 
is inherited when being restarted. With increasing number of preconditioned itera- 
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tive methods for linear systems [55], Lanczos-type algorithms have become less at- 
tractive mainly because in every iteration step the systems of type (^4 — i7l)x = b 
have to be solved to full accuracy in order to avoid false eigenvalues. In contrast 
to this, JACOBI-DAVIDSON-Iike methods [61] allow using a crude approximation of 
the underlying linear system. From the point of view of linear solvers as part of the 
eigenvalue computation, modern direct and iterative methods need to inherit the sym- 
metric structure A = while remaining both time and memory efficient. Symmetric 
matching algorithms [22, 24, 57] have significantly improved these methods. 

4.1. The shift-and-invert mode of the restarted Lanczos method. The 

Lanczos method for real symmetric matrices A near a shift a is based on computing 
successively orthonormal vectors [vi,. . . , Vk, Vk+i] and a tridiagonal 1) x fc matrix 

/ ai Pi 
/?fc-i Qfc 

V Pk 

where is the kth unit vector in M*^, such that 

{A-aI)-'^[vi,...,Vk\ = [vi,...,Vk,Vk+i]Tk. (4.2) 

Since only a limited mmiber of Lanczos vectors vi, . . . ,Vk can be stored and since 
this Lanczos sequence also consists of redundant information about undesired small 
eigenvalues, implicitly restarted Lanczos methods have been proposed [43, 62] that 
use implicitly shifted QR [35] exploiting the small eigenvalues of Tk to remove them 
out of this sequence without ever forming a single matrix vector multiplication with 
{A — al)~^. The new transformed Lanczos sequence 

{A-aI)-^[vi,...,ii] = [ii,...,vi,vi+i]J] (4.3) 

with I <C k then allows to compute further k — I approximations. This approach is at 
the heart of the symmetric version of Arpack [42,43]. 

4.2. The symmetric Jacobi-Davidson method. One of the major draw- 
backs of shift-and-invert Lanczos algorithms is the fact that the multiplication with 
{A — al)~^ requires solving a linear system to full accuracy. In contrast to this, 
JACOBI-DAVIDSON-Iike algorithms [61] are based on a Newton-like approach to solve 
the eigenvalue problem. Like the Lanczos method the search space is expanded step 
by step solving the correction equation 

{I - uu^){A- ei){I -uu^)z = -r such that z = {I - uu^)z (4.4) 

where (m, 0) is the given approximate eigenpair and r = Au — 6u is the associated 
residual. Then the search space based on Vfe = [f i , . . . , Vk\ is expanded by reorthog- 
onalizing z with respect to [wi, . . . , Vk\ and a new approximate eigenpair is computed 
from the Ritz approximation [Vfe, z]'^ A[Vk, z\. When computing several right eigenvec- 
tors, the projection / — uu^ has to be replaced by / — [Q, m][(5, using the already 
computed approximate eigenvectors Q. This ensures that the new approximate eigen- 
pair is orthogonal to those that have already been computed. 



Tk 



(4.1) 
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The most important part of the Jacobi-Davidson approach is to construct an 
approximate solution for H4.4|l such that 

(/ - uu^)K{I - uu^)c = d with z = (4.5) 

and K K A ~ 91 that allows a fast solution of the system Kx = b. Here, there is a 
strong need for robust preconditioning methods that preserve symmetry and efficiently 
solve sequences of linear systems with K. If K is itself symmetric and indefinite, then 
the simplified QMR method [29,30] using the preconditioner ^/ — ^j^^ K~^, where 

Kw = u and the system matrix (/ — uu^) {A — 91) can be used as iterative method. 
Note that here the accuracy of the solution of (|4.4|l is uncritical until the approximate 
eigenpair converges [28]. This fact has been exploited in Jdbsym [4,32]. For an 
overview on Jacobi-Davidson methods for symmetric matrices see [33]. 

5. On recent algorithms for solving symmetric indefinite systems of 
equations. We now report on recent improvements in solving symmetric indefinite 
systems of linear equations that have significantly changed sparse direct as well as 
preconditioning methods. One key role that made these approaches successful is 
played by the use of symmetric matchings that we review in Section 15.21 

5.1. Sparse direct factorization methods. For a long time dynamic pivoting 
has been a central point for nonsymmetric sparse linear solvers to gain stability. 
Therefore, improvements in speeding up direct factorization methods were limited 
to the uncertainties that have arisen from using pivoting. Certainly techniques like 
the column elimination tree [19, 34] have been a useful tool to predict the sparsity 
pattern despite pivoting. However, in the symmetric case the situation becomes more 
complicated since only symmetric reorderings applied to both, columns and rows, are 
required and no a-priori choice of pivots is given. This makes it almost impossible to 
predict the elimination tree in a sensible manner and the use of cache-oriented level-3 
BLAS [20,21] was impossible. 

With the introduction of symmetric maximum weighted matchings [22] as alter- 
native to complete pivoting it is now possible to treat symmetric indefinite systems 
almost similar to symmetric positive definite systems. This allows the prediction of 
fill using the elimination tree [31] and thus to set up the data structures that are re- 
quired to predict dense submatrices (also known as supernodes). This in turn means 
that one is able to exploit level-3 BLAS applied to the supernodes. Consequently, the 
classical Bunch-Kaufman pivoting approach [12] need to be performed only inside the 
supernodes. 

This approach has recently been successfully implemented in the sparse direct 
solver Pardiso [57] and as a major consequence, this novel approach has improved 
the sparse indefinite solver to become almost as efficient as its symmetric positive 
analogy. Certainly for the Anderson problem studied here, Pardiso is about 2 orders 
of magnitude more efficient than previously used direct solvers [27]. We also note 
that the idea of symmetric weighted matchings can be carried over to incomplete 
factorization methods with similar success [38]. 

5.2. Symmetric weighted matchings as an alternative to complete piv- 
oting techniques. Symmetric weighted matchings [22,24], which will be explained 
in detail in Scction f6.2l can be viewed as a preprocessing step that rescales the original 
matrix and at the same time improves the block diagonal dominance. By this strat- 
egy, all entries are at most one in modulus and in addition the diagonal blocks are 
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either 1x1 scalars an such that |ajj| = 1 (in exceptional cases we will have an = 0) 
or they are 2x2 blocks 



Although this strategy does not necessarily ensure that symmetric pivoting like in 
Ref. [12] is unnecessary, it is nevertheless likely to waive dynamic pivoting during the 
factorization process. It has been shown in [24] that based on symmetric weighted 
matchings the performance of the sparse symmetric indefinite multifrontal direct solver 
MA57 is improved significantly, although a dynamic pivoting strategy by Duff and 
Reid [25] was still present. Recent results in [57] have shown that the absence of 
dynamic pivoting does not harm the method anymore and that therefore symmetric 
weighted matchings can be considered as alternative to complete pivoting. 

6. Symmetric reorderings to improve the results of pivoting on re- 
stricted subsets. In this section we will discuss weighted graph matchings as an 
additional preprocessing step. The motivation for weighted matching approaches is 

to identify large entries in the coefficient matrix A that, if permuted close to the 
diagonal, permit the factorization process to identify more acceptable pivots and pro- 
ceed with fewer pivot perturbations. These methods are based on maximum weighted 
matchings A4 and improve the quality of the factor in a complementary way to the 
alternative idea of using more complete pivoting techniques. The idea to use a per- 
mutation associated with a weighted matching M as an approximation of the 
pivoting order for nonsymmctric linear systems was firstly introduced by Olschowka 
and Neumaier [49] and extended by Duff and Koster [23] to the sparse case. Per- 
muting the rows A <— Pm^ of the sparse system to ensure a zero- free diagonal or 
to maximize the product of the absolute values of the diagonal entries are techniques 
that are now often regularly used for nonsymmctric matrices [7,46,58,59]. 

6.1. Matching algorithms for nonsymmetric matrices. Let A = (aij) € 
j^rtxn general matrix. The nonzero elements of A define a graph with edges 
^ = {('!.?) • '^'ij 7^ 0} of ordered pairs of row and column indices. A subset A4 C £ 
is called a matching or a transversal, if every row index i and every column index j 
appear at most once in A^. A matching M. is called perfect if its cardinality is n. For 
a nonsingular matrix at least one perfect matching exists and can be found with well 
known algorithms. With a perfect matching A4, it is possible to define a permutation 
matrix Pa4 = (pij) with: 



As a consequence, the permutation matrix h&s nonzero elements on its diagonal. 

This method only takes the nonzero structure of the matrix into account. There are 
other approaches which maximize the diagonal values in some sense. One possibility 
is to look for a matrix PM^ such that the product of the diagonal values of Pm-^ is 
maximal. In other words, a permutation a has to be found, which maximizes 




such that \aii\, |ai+i,i+i| ^ 1 and |ai+i,i| = |oi,i+i| = 1. 




(6.1) 



n 



(6.2) 



i=l 
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Fig. 6.1. Illustration of the row permutation. A small numerical value is indicated by a o-symbol 
and a large numerical value by an •-symbol. The matched entries J\4 are marked with squares and 
Pm = (64; ei; es; 62; es; ee). 




This maximization problem is solved indirectly. It can be reformulated by defining a 
matrix C = {cij) with 

-log lay I fly 7^0 ^g^^ 
otherwise, 

where ai = maxj |aij|, i.e. the maximum element in row i of matrix A. A permutation 
(T, which minimizes X]r=i ^(y(i)i ^l^o maximizes the product (j6.2(l . 

The minimization problem is known as linear sum assignment problem or bipartite 
weighted matching problem in combinatorial optimization. The problem is solved by 
a sparse variant of the Kuhn-Munkres algorithm. The complexity is 0{n'^) for full 
nx n matrices and 0(nr log n) for sparse matrices with r entries. For matrices whose 
associated graph fulfills special requirements, this bound can be reduced further to 
O (n"(r + nlogn)) with a < 1. All graphs arising from finite-difference or finite- 
element discretizations meet these conditions [37] . As before, we finally get a perfect 
matching Ai that in turn defines a nonsymmetric permutation P^. 

The effect of nonsymmetric row permutations using a permutation associated 
with a matching A4 is shown in Figure IHTI It is clearly visible that the matrix Pm-^ 
is now nonsymmetric, but has the largest nonzeros on the diagonal. 

6.2. Symmetric 1x1 and 2x2 block weighted matchings. In the case of 
symmetric indefinite matrices, we are interested in symmetrically permuting PAP^ . 
The problem is that zero or small diagonal elements of A remain on the diagonal by 
using a symmetric permutation PAP^ . Alternatively, instead of permuting a largC""^ 
off-diagonal element nonsymmetrically to the diagonal, one can try to devise a 
permutation Pg such that PgAP^ permutes this element close to the diagonal. As a 

result, if we form the corresponding 2x2 block to 

diagonal entry to be large and thus the 2x2 block would from a suitable 2x2 pivot 
for the Supernode-Bunch-Kaufman factorization. An observation on how to build Ps 
from the information given by a weighted matching M. was presented by Duff and 
Gilbert [22]. They noticed that the cycle structure of the permutation Pm associated 
with the nonsymmetric matching M. can be exploited to derive such a permutation 
Ps- For example, the permutation P^ from Figure 16.11 can be written in cycle 
representation as Pq — (ei; 62; e4)(e3; e5)(e6). This is shown in the upper graphics 
in Figure HOI The left graphic displays the cycles (1 2 4), (3 5) and (6). If we modify 
the original permutation Pm = (64; ei; 65; 62; 63; eg) into this cycle permutation Pc = 



we expect the off- 



^ Large in the sense of the weighted matching M . 



10 



O. SCHENK, M. BOLLHOFER, AND R.A.ROMER 



A 



A 



Qffl 






• 


B 


o 




















o 




m 









O 














O 


m 



PcAPj = 



PsAPi 



• m o 

I o 

om 
o a 

om 
o m 



Fig. 6.2. Illustration of a cycle permutation with Pq = (ei; £2; e4)(e3; e5)(e6) ^"5 = 

(ei)(e2; e4)(e3; e5)(e6). The symmetric matching Pg has two additional elements (indicated by 
dashed boxes), while one element of the original matching fell out (dotted box). The two 2-cycles 
are permuted into 2x2 diagonal blocks to serve as initial 2x2 pivots. 



(ei; 62', e4)(e3; e5)(e6) and permute A symmetrically with PqAPq , it can be observed 
that the largest elements are permuted to diagonal blocks. These diagonal blocks are 
shown by filled boxes in the upper right matrix. Unfortunately, a long cycle would 
result into a large diagonal block and the fill-in of the factor for PqAP^ may be 
prohibitively large. Therefore, long cycles corresponding to Pm must be broken down 
into disjoint 2x2 and 1x1 cycles. These smaller cycles are used to define a symmetric 
permutation Ps = (ci, . . . , Cm), where m is the total number of 2 x 2 and 1x1 cycles. 

The rule for choosing the 2x2 and 1x1 cycles from Pq to build Pg is straight- 
forward. One has to distinguish between cycles of even and odd length. It is always 
possible to break down even cycles into cycles of length two. For each even cycle, 
there are two possibilities to break it down. We use a structural metric [24] to decide 
which one to take. The same metric is also used for cycles of odd length, but the 
situation is slightly different. Cycles of length 21 + 1 can be broken down into / cycles 
of length two and one cycle of length one. There are 21 + 1 different possibilities to do 
this. The resulting 2x2 blocks will contain the matched elements of Ai. However, 
there is no guarantee that the remaining diagonal element corresponding to the cycle 
of length one will be nonzero. Our implementation will randomly select one element 
as a 1 X 1 cycle from an odd cycle of length 2/ + 1. 

A selection of Ps from a weighted matching Pm is illustrated in Figure IfT^l The 
permutation associated with the weighted matching, which is sorted according to the 
cycles, consists of Pq — (ei; 62; e4)(e3; e5)(e6). We now split the full cycle of odd 
length three into two cycles (1)(24) — resulting in Ps = (ei)(e2; e4)(e3; e5)(e6). If 
Ps is symmetrically applied to A ^ PsAP^ , we see that the large elements from 
the nonsymmetric weighted matching M will be permuted close to the diagonal and 
these elements will have more chances to form good initial 1x1 and 2x2 pivots for 
the subsequent (incomplete) factorization. 

Good fill-in reducing orderings PpiU are equally important for symmetric indefinite 
systems. The following section introduces two strategies to combine these reorderings 
with the symmetric graph matching permutation Ps- This will provide good initial 
pivots for the factorization as well as a good fill-in reduction permutation. 
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6.3. Combination of orderings Ppiii for fill reduction with orderings 
based on weighted matchings. In order to construct the factorization efRciently, 
care has to be taken that not too much fiU-in is introduced during the ehmination 
process. We now examine two algorithms for the combination of a permutation Ps 
based on weighted matchings to improve the numerical quality of the coefhcient matrix 
A with a fill-in reordering Ppiu based on a nested dissection from Metis [39]. The 
first method is based on compressed subgraphs and has also been used by Duff and 
Pralet in [24] in order to find good scalings and orderings for symmetric indefinite 
systems. 

In order to combine the permutation Pg with a fill-in reducing permutation, we 
compress the graph of the reordered system P^APg and apply the fill-in reducing 
reordering to the compressed graph. In the compression step, the union of the struc- 
ture of the two rows and columns corresponding to a 2 x 2 diagonal block are built, 
and used as the structure of a single, compressed row and column representing the 
original ones. 

If Ga = {V; E) is the undirected graph of A and a cycle consists of two vertices 
(s, t) G V , then graph compression will be done on the 1x1 and 2x2 cycles, which have 
been found using a weighted matching AA on the graph. The vertices (s, t) are replaced 
with a single supervertex u = {s, i} £ 14 in the compressed graph Gc = (Vc, Ec). An 
edge Be = (s, t) G Ec between two supervertices s = {si, 82} G Vc and i = {ti, G Vc 
exists if at least one of the following edges exist in E : (si, ii), (si, ^2), (s2, ^i) or 
(32,^2)- The fill-in reducing ordering is found by applying Metis on the compressed 
graph Gc — (Vc, Ec). Expansion of that permutation to the original numbering yields 
Pfiii- Hence all 2 x 2 cycles that correspond to a suitable 2x2 pivot block are 
reordered consecutively in the factor. 

7. Symmetric multi-level preconditioning techniques. We now present a 
new symmetric indefinite approximate multilevel factorization that is mainly based 
on three parts which are repeated in a multilevel framework in each subsystem. The 
components consist of (i) reordering of the system, (ii) approximate factorization 
using inverse-based pivoting and, (iii) recursive application to the system of postponed 
updates. 

7.1. Reordering the given system. The key ingredient to turn this approach 
into an efficient multilevel solver consists of the symmetric maximum weight matching 
presented in Section [5. 21 After the system is reordered into a representation 

P'^DADPs = A, (7.1) 

where D,Ps G M"'", D is a diagonal matrix and Ps is a permutation matrix, A is 
expected to have many diagonal blocks of size 1 x 1 or 2 x 2 that are well-conditioned. 
Once the diagonal block of size 1x1 and 2x2 are built, the associated block graph 
of A is reordered by a symmetric reordering, e. g. Amd [1] or Metis [39], i. e. 

\l^PjDADPsn = i, 

where 11 G M"'" refers to the associated symmetric block permutation. 

7.2. Inverse-based pivoting. Given A we compute an incomplete factorization 
LDL^ = A + E A. To do this at step k of the algorithm we have 



(7.2) 
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where Lb G k'^-i-^-i is lower triangular with unit diagonal and Db € is 
block diagonal with diagonal blocks of size 1x1 and 2x2. Also, Sc — C — LpDsL^ = 
denotes the approximate Schur complement. To proceed with the incomplete 
factorization we perform either a 1 x 1 update or a 2 x 2 block update. One possible 
choice could be to use Bunch's algorithm [11]. This approach has been used in Ref. 
[38]. Here we use a simple criterion based on block diagonal dominance of the leading 
block column. Depending on the values 

i>i ' ' i>2 ^ ^ 

we perform a 2 x 2 update only if d2 < di. The leading two columns of Sc can be 
cfRcicntly computed using linked lists [44] and it is not required to have all entries of 
Sc available. 

When applying the (incomplete) factorization LDL^ to A we may still encounter 
a situation where at step k either l/lsiij or ]] (sij)7j^2 II large or even infinity. 
Since we are dealing with an incomplete factorization we propose to use inverse-based 
pivoting [8]. Therefore we require in every step that 

for a prescribed bound k. If after the update using a 1 x 1 pivot (or 2 x 2 pivot) 
the norm of the inverse lower triangular factor fails to be less than k, the update 
is postponed and the leading rows/columns of Le, Sc are permuted to the end. 
Otherwise depending on whether alxlora2x2 pivot has been selected, the entries 



(■Sji/sii)i>i. {{sji, Sj2) ^ j (7.6) 

become the next (block) column of L and we drop these entries whenever their absolute 

value is less than e/n for some threshold e. For a detailed description see Ref. [8]. 
The norm of the inverse can cheaply be estimated using a refined strategy of Ref. [15] 
and is part of the software package Ilupack that is now extended to the symmetric 
indefinite case [9]. 

7.3. Recursive application. After the inverse-based ILU we have an approxi- 
mate factorization 

and it typically does not pay off to continue the factorization for the remaining matrix 
^22 which consists of the previously postponed updates. Thus ^22 is now explicitly 
computed and the strategies for reordering, scaling and factorization are recursively 
applied to ^22 leading to a multilevel factorization. 

Note that in order to save memory L21 is not stored but implicitly approximated 
by A2i{LiiDiiUxi)^^ . In addition we use a technique called aggressive dropping that 
sparsifies the triangular factor L a posteriori. To do this observe that when applying 
a perturbed triangular factor for preconditioning instead of we have 



L-i = (/ -h El)L-\ where El = L'^L - L). 
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We can expect that serves as a good approximation to L^^ as long as *C 1- 

If we obtain L from L by dropping some entry, say lij from L, then we have to ensure 
that 

|lL-ie,|l -IZy l s;r«l, 

for some moderate constant r < 1, e.g. t = 0.1. To do this it is required to have 
a good estimate for i/^ « ||L~^ei|| available, for any i — 1, . . . ,n. In principle it can 
be computed [8, 15] using instead of L. Last, knowing how many entries exist in 
column j, we could drop any lij such that 

\k, I T/{iy, ■ #{lkj : Zfcj ^ 0, A: = J + 1, . . . , n}). 

7.4. Iterative solution. By construction, the computed incomplete multilevel 
factorization is symmetric but indefinite. For the iterative solution of linear systems 
using the multilevel factorization, in principle different Krylov subspace solvers could 
be used. For example, general methods that do not explicitly use symmetry (e.g. 
GMRES [56]) or methods like SYMMLQ [50] which preserve the symmetry of the 
orginal matrix but which are only devoted for symmetric positive definite precondi- 
tioners. To fully exploit both, symmetry and indefiniteness at the same time, here 
the simplified QMR method [29,30] is chosen. 

8. Numerical Experiments. Here we present numerical experiments that show 
that the previously outlined advances in symmetric indefinite sparse direct solvers as 
well as in preconditioning methods significantly accelerate modern eigenvalue solvers 
and allow us to gain orders of magnitude in speed compared to more conventional 
methods. 

8.1. Computing Environments and Software. All large scale numerical ex- 
periments for the Anderson model of localization were performed on an SGI Altix 
3700/BX2 with 56 Intel Itanium2 1.6 GHz processors and 112 GB of memory. If not 
explicitly stated, we always used only one processor of the system and all algorithms 
were implemented in either C or Fortran??. All codes were compiled by the Intel V8.1 
compiler suite using ifort and ice with the -03 optimization option and linked with 
basic linear algebra subprograms optimized for Intel architectures. For completeness, 
let us recall the main software packages used. 

• Arpack is a collection of Fortran?? subroutines designed to solve large scale 
eigenvalue problems. The eigenvalue solver has been developed at the De- 
partment of Computational and Applied Mathematics at Rice University. It 
is available at http://www.caain.rice.edu/software/ARPACK 

• Jdbsym is a C library implementation of the Jacobi-Davidson method 
optimized for symmetric eigenvalue problems. It solves eigenproblems of the 
form Ax = Xx and Ax = XBx with or without preconditioning, where A 
is symmetric and B is symmetric positive definite. It has been developed 
at the Computer Science Department of the ETH Zurich and is available at 
http : //people . web .psi . ch/geus/sof tware .html 

• Pardiso is a fast direct solver package, developed at the Computer Science 

Department of the University of Basel and available at http : //www . computation al . unibas . ch/ cs/ scicomp/ sof twa 

• Ilupack is an algebraic multilevel preconditioning software package. This 
iterative solver has been developed at the Mathematics Department of the 

Technical University of Berlin. It is available at,http : //www . math . tu-berlin . de/ilupackj 
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8.2. Numerical Results. In our numerical experiments we will first compare 
the classical Cwi with the shift-and-invert Lanczos method using implicit restarts. 
The latter is part of Arpack [42]. For the solution of the symmetric indefinite 
system A — 91 we use the most recent version of sparse direct solver Pardiso [57]. 
This version is based on symmetric weighted matchings and uses Metis as symmetric 
reordering strategy. The numerical results deal with the computation of 5 eigenvalues 
of the Anderson matrix A near A = 0. Here we state the results for the physically most 
interesting critical disorder strength w — Wc — 16.5 (cf. Table IHTT) . As can be seen 
from Table IHTI the PARDiSO-based shift-and-invert Lanczos is clearly superior to the 
classic Cwi method by at least one order of magnitude regarding computation time. 
Despite this success, with increasing problem size the amount of memory consumed 
by the sparse direct solver becomes significant and numerical results N larger than 
I'OOO'OOO are skipped. 

Table 8.1 

CPU times in seconds and memory requirements in GB to compute atw = 16.5 five eigenvalues 
closest to \ = of an Anderson matrix of size N = X with Cwi and Arpack-Pardiso. For 
Cwi and M = 90, 100, not all 5 eigenvalues converged successfully, so the eigenvector reconstruction 
finished quicker, leading to apparently shorter CPU times (*). 



M 


N 


Cwi 


Arpack- Pardiso 






time 


mem. 


time 


mem. 


30 


27'000 


21 


0.01 


9 


0.08 


40 


64'000 


300 


0.02 


46 


0.28 


50 


125'000 


1'246 


0.04 


157 


0.68 


60 


216'000 


4'748 


0.07 


495 


1.49 


70 


343'000 


15'100 


0.11 


1'309 


3.00 


80 


512'000 


39'432 


0.16 


3'619 


5.12 


90 


729'000 


97'119* 


0.23 


7'909 


8.70 


100 


I'OOO'OOO 


255'842* 


0.32 


20'239 


14.34 



Instead, we switch to the iLUPACK-based preconditioner that is also based on 
symmetric weighted matchings and in addition uses inverse-based pivoting. In par- 
ticular, for our experiments we use k = 5 as bound for the norm of the inverse 
triangular factor and Amd for the symmetric reordering. We also tried to use Metis 
but for this particular matrix problem we find that Amd is clearly more memory 
efficient. Next we compare PARDiSO-based shift-and-invert Lanczos (Arpack) with 
that using Ilupack and the simplified QMR as inner iterative solver. Here we use 
s — \/sqrt{N) with aggressive dropping and the QMR method is stopped once the 
norm of residual satisfies \\Ax — b\\ ^ 10^^"]]6]]. In order to illustrate the benefits of 
using symmetric weighted matchings we also tried Ilupack without matching, but 
the numerical results are disappointing as can be seen from the \s in Table Wi?^ We 
emphasize that the multi-level approach is crucial, a simple use of incomplete fac- 
torization methods without multi-level preconditioning [38] does not give the desired 
results. Besides the effect of matchings we also compare how the performance of the 
methods changes when varying the value w from the critical value w — Wc — 16.5 to 
w = 12.0 and w = 21.0. We find that these changes do not affect the sparse direct 
solver at all while the multilevel ILU significantly varies in its performance. Up to 
now our explanation for this effect is the observation that with increasing w the di- 
agonal dominance of the system also increases and the Ilupack preconditioner gains 
from higher diagonal dominance. As we can see from Table Ilupack still uses 
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significantly less memory than the direct solver Pardiso for all values of w and it 
is the only method we were able to use for larger N due to the memory constraints. 
Also, the computation time is best. 

Table 8.2 

CPU times in seconds and memory requirements in GB to compute five eigenvalues closest 
to X = of an Anderson matrix of size X A^^ with Arpack-Pardiso, Arpack-Ildpack, and 
Arpack-Ilupack-Symmatch. The symbol ' — ' indicates that a memory consumption was larger 
than 25 GB and 'f ' indicates memory problems with respect to the fill-in. 



M 


W 






Arpack 










Pardiso 


Ilupack 


Ilupack- Symmatch 






time 


mem. 


time 


mem. 


time 


mem. 


70 


12.0 


1'359 


3.00 


5'117 


1.09 


2'140 


0.95 


100 


12.0 


20'639 


14.34 


39'222 


5.62 


13'583 


3.20 


130 


12.0 






t 


t 


65722 




70 


16.5 


1'305 


3.00 


504 


0.33 


477 


0.31 


100 


16.5 


20'439 


14.34 


2'349 


0.95 


2'177 


0.89 


130 


16.5 






6'320 


2.09 


6'530 


1.95 


160 


16.5 






23'663 


3.95 


13'863 


3.63 


70 


21.0 


1'225 


3.00 


371 


0.22 


310 


0.22 


100 


21.0 


20'239 


14.34 


r513 


0.64 


r660 


0.65 


130 


21.0 






3'725 


1.41 


3'527 


1.44 


160 


21.0 






15'302 


2.63 


20'120 


2.68 



When using preconditioning methods inside shift-and-invert Lanczos we usually 
have to solve the inner linear system for A — 01 up to the machine precision to make 
sure that the eigenvalues and eigenvectors are sufficiently correct. In contrast to this 
the Jacobi-Davidson method allows to solve the associated correction equation less 
accurately and only when convergence takes place a more accurate solution is required. 
In order to show the significant difference between the iterative parts of Arpack and 
Jacobi-Davidson we state the number of iteration steps in Table If we were to 
aim for more eigenpairs, we expect that eventually the Jdbsym becomes less efficient 
and should again be replaced by Arpack. 

In the sequel we compare the traditional Cwi method with the Jacobi-Davidson 
code Jdbsym [33] using Ilupack as preconditioner. Table shows that switching 
from Arpack to Jacobi-Davidson in this case improves the total method by an- 
other factor 6 or greater. For this reason Jacobi-Davidson together with Ilupack 
will be used as default solver in the following. The numerical results in Table show 
a dramatic improvement in the computation time by using iLUPACK-based Jacobi- 
Davidson. Although this new method slows down for smaller w due to poorer diag- 
onal dominance, a gain by orders of magnitude can still be observed. For w = 16.5 
and larger, even more than three orders of magnitude in the computation time can 
be observed. Hence the new method drastically outperforms the Cwi method while 
the memory requirement is still moderate. 

One key to the success of the preconditioner is based on the threshold k which 
bounds the growth of L^^. Already for a small example such as M = 70 significant 
differences can be observed. As we show in Table IITKl increasing the bound by a factor 
2 from K = 5 up to K = 10 and k = 20 leads to an enormous increase of fill. Here 
we measure the fill of the incomplete LDL^ factorization relative to the non-zeros of 
the original matrix. By varying the drop tolerance e we also see that the dependence 
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Table 8.3 

Number of inner/outer interaction steps inside ARPACKand J ACOBI- Davidson. The symbol ' — ' 
indicates that the computations were not performed anymore for Arpack. 



M 


W 




Ilupack-Symmatch 










Arpack 


Jacobi-Davidson 






outer 


total 


inner 
average 


outer 


total 


inner 
average 


70 


12.0 


42 


871 


20.7 


43 


246 


5.7 


100 


12.0 


43 


1101 


25.6 


44 


325 


7.4 


130 


12.0 


42 


1056 


25.1 


44 


274 


6.2 


70 


16.5 


43 


611 


14.2 


41 


200 


4.88 


100 


16.5 


43 


857 


19.9 


43 


231 


5.37 


130 


16.5 


42 


1058 


25.2 


38 


313 


8.24 


160 


16.5 


42 


968 


23.1 


41 


276 


6.73 


190 


16.5 








39 


339 


8.69 


220 


16.5 








40 


433 


10.82 


250 


16.5 








47 


652 


13.87 


70 


21.0 


43 


585 


13.60 


40 


200 


5.00 


100 


21.0 


42 


1004 


23.90 


42 


301 


7.17 


130 


21.0 


44 


914 


20.77 


39 


274 


7.03 


160 


21.0 


25 


896 


35.84 


43 


507 


11.79 


190 


21.0 








46 


637 


13.85 


220 


21.0 








41 


855 


20.85 


250 


21.0 








43 


891 


20.72 



on K is much more significant than the dependence of e. Roughly speaking, the ILU 
decomposition becomes twice as expensive when k, is replaced by 2k and so does the 
fill-in. The latter is crucial since memory constraints severely limit the size of the 
application that can be computed. 

In Table 18.61 we show how Jdbsym and Ilupack-Symmatch perform when in- 
stead of periodic boundary conditions, we use hard wall boundaries, i.e., xo;j;fc = 
Xi-fl-k = Xi-j-fi = XM+i-j-k = Xi-M+i-k = Xi-j-M+1 = for aU i,j,k. This is sometimes 
of interest in the Anderson problem and generally, it is expected that for large M, 
the difference in eigenvalues and eigenvectors becomes small when compared to the 
standard periodic boundaries. In addition, we also show results for the so-called off- 
diagonal Anderson problem [14]. Here, we shift the diagonal to a constant a = 1.28 
and incorporate the randomness by setting the off-diagonal elements of A to be uni- 
formly distributed in [—1/2, 1/2]. The graph of the matrix A remains the same. These 
values correspond — similarly to Wc — 16.5 used before for purely diagonal random- 
ness — to the physically most interesting localization transition in this model [14]. 
We note that using hard wall boundary conditions instead of periodic boundary con- 
ditions leads to slightly less fill but increases the number of iteration steps as can be 
seen in Table This conclusions carries over to the off-diagonal Anderson problem, 
where the memory consumption is less but the iterative part takes even longer. In 
principle our results could be improved if we were to switch to a smaller threshold e 
than the here uniformly applied e — l/sqrt{N). 
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Table 8.4 

CPU times in seconds and memory requirements in GB to compute five eigenvalues closest to 
A = with Cwi and .Jacobi-Davidson using Ilupac'K-Symmatch for the shift- and-invert technique. 
% ' indicates that the convergence of the method was too slow. For Cwi and M = 100, not all 5 
eigenvalues converged successfully, so the eigenvector reconstruction finished quicker, leading to 
variances in the CPU times (*). 



M 


W 


Cwi 


Jacobi-Davidson 










Ilupack-Symmatch 






time 


mem. 


time 


mem. 


70 


12.0 


20'228 


0.11 


1'314 


0.95 


100 


12.0 


148'843 


0.32 


8'522 


2.93 


130 


12.0 


t 


t 


56'864 


8.06 


70 


16.5 


15'100 


0.11 


258 


0.34 


100 


16.5 


255'842* 


0.32 


978 


0.98 


130 


16.5 


t 


t 


2'895 


2.16 


160 


16.5 


t 


t 


5'860 


4.05 


190 


16.5 


t 


t 


16'096 


6.75 


220 


16.5 


t 


t 


30'160 


10.58 


250 


16.5 


t 


t 


86'573 




70 


21.0 


14'371 


0.11 


255 


0.26 


100 


21.0 


331'514* 


0.32 


1'134 


0.75 


130 


21.0 


t 


t 


1'565 


1.65 


160 


21.0 


t 


t 


4'878 


3.08 


190 


21.0 


t 


t 


11'032 


5.22 


220 


21.0 


t 


t 


28'334 




250 


21.0 


t 


t 


48'223 





Table 8.5 

The influence of the inverse bound n on the amount of memory. For M = 70 compare for differ- 
ent thresholds how the fill-in nnz(LDL^)/nnz{A) varies depending on k and state the computation 
time in seconds. 



e 


K = 5 

fill time total 
LDL'^ time 


K = 10 

fill time total 
LDL'^ time 


K = 20 
fill time total 
LDL'^ time 


0.01 
0.005 
0.0025 
0.001 


5.4 3.7-10^ 8.7-102 
6.8 5.4 -IQi 4.4-102 
8.6 8.1 -10^ 3.1-102 
11.7 1.3-102 3.0-102 


8.7 6.7-10^ 5.0-102 
11.0 1.0-102 3.8-102 
13.8 1.5 - 102 3.6 - 102 
18.0 2.3 - 102 4.1 ■ 102 


15.2 1.6-102 4.8-102 
19.1 2.3-102 5.0-102 
24.1 3.4-102 6.0-102 
32.1 5.4-102 7.8-102 



9. Conclusion. We have shown that modern approaches to preconditioning 
based on symmetric matchings and multilevel preconditioning methods lead to an as- 
tonishing increase in performance and available system sizes for the Anderson model 
of localization. This approach is not only several orders of magnitudes faster than the 

traditional Cwi approach, it also consumes only a moderate amount of memory thus 
allowing to study the Anderson eigenproblem for significantly larger scales than ever 
before. 

Let us briefly recall the main ingredients necessary for this progress: At the heart 
of the new approach lies the use of symmetric matchings [38] in the preconditioning 
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Table 8.6 

Difference in performance for our standard problem with periodic boundary conditions, the 
problem with hard wall conditions, and the inverse problem with random numerical entries in the 
off-diagonal elements. Memory requirement (in GB) and CPU times (in seconds) to compute at the 
transition the eigenvectors corresponding to the five eigenvalues closest to A = with shift- and-invert 
Jacobi- Davidson and the Ilupack-Symmatch solver using symmetric weighted matchings. 



N 


Periodic 


Hard wall 


Inverse 




time 


memory 


time 


memory 


time 


memory 


70 


258 


0.34 


282 


0.31 


457 


0.27 


100 


980 


0.98 


969 


0.94 


2'075 


0.79 


130 


5'244 


2.16 


2'090 


2.07 


6'472 


1.76 


160 


9'958 


4.05 


5'661 


3.90 


11'975 


3.30 


190 


14742 


6.75 


13'431 


6.62 


27'488 





stage of the inverse-based incomplete factorization preconditioning iterative method 
[9]. Furthermore, the preconditioning itself is of a multi-level type, complementary 
to the often used full-pivoting strategies. Next, the inverse-based approach is also of 
paramount importance to keep the fill-in at a manageable level (see Table I^SJ. And 
last, we emphasize that these results, of course, reflect our selected problem class: 
to compute a few of the interior eigenvalues and associated eigenvectors for a highly 
indefinite symmetric matrix defined by the Anderson model of localization. 

The performance increase by several orders of magnitude (see Table HOjl is solely 
due to our use of new and improved algorithms. Combined with advances in the 
performance to cost ratio of computing hardware during the last 6 years period, 
current preconditions methods makes it possible to solve those problems quickly and 
easily which have been considered by far too large until recently [13]. Even for iV x iV 
matrices as large as = 64 • 10^, it is now possible within a few days to compute the 
interior eigenstates of the Anderson problem. 

The success of this method indicates that it might also be successfully applied to 
other large-scale problems arising in (quantum) physics. 
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